Abstract — When is optimal estimation linear? It is well known 
that, when a Gaussian source is contaminated with Gaussian 
noise, a linear estimator minimizes the mean square estimation 
error. This paper analyzes, more generally, the conditions for 
linearity of optimal estimators. Given a noise (or source) distri- 
bution, and a specified signal to noise ratio (SNR), we derive 
conditions for existence and uniqueness of a source (or noise) 
distribution for which the Lp optimal estimator is linear. We 
then show that, if the noise and source variances are equal, then 
the matching source must be distributed identically to the noise. 
Moreover, we prove that the Gaussian source-channel pair is 
unique in the sense that it is the only source-channel pair for 
which the mean square error (MSE) optimal estimator is linear 
at more than one SNR value. Further, we show the asymptotic 
linearity of MSE optimal estimators for low SNR if the channel 
is Gaussian regardless of the source and, vice versa, for high 
SNR if the source is Gaussian regardless of the channel. The 
extension to the vector case is also considered where besides the 
conditions inherited from the scalar case, additional constraints 
must be satisfied to ensure linearity of the optimal estimator. 

Index Terms — Optimal estimation, linear estimation. 



I. Introduction 

CONSIDER a basic problem in estimation theory, namely, 
source estimation from a signal received through a chan- 
nel with additive noise, given the statistics of both source and 
channel. The optimal estimator that minimizes the mean square 
error (MSE) is usually a nonlinear function of the observation. 
A frequently exploited result in estimation theory concerns 
the special case of Gaussian source and Gaussian noise, a 
case in which the MSE optimal estimator is guaranteed to 
be linear. An open follow-up question considers the existence 
of other cases exhibiting such a "coincidence", and more 
generally the characterization of conditions for linearity of 
optimal estimators for general distortion measures. 

This problem also has practical importance beyond theo- 
retical interest, mainly due to significant complexity issues 
in both design and operation of estimators. Specifically, the 
optimal estimator generally involves entire probability distri- 
butions, whereas linear estimators require only up to second- 
order statistics for their design. Moreover, unlike the optimal 
estimator which can be an arbitrarily complex function that 
is difficult to implement, the linear estimator consists of a 
simple matrix-vector operation. Hence, linear estimators are 
more prevalent in practice, despite their suboptimal performance 
in general. They also represent a significant temptation 
to "assume" that processes are Gaussian, sometimes despite 
overwhelming evidence to the contrary. Results in this paper 
identify the cases where a linear estimator is optimal, and 
when the use of linear estimators is justified in practice without 
recourse to complexity arguments. 

The estimation problem in general has been studied intensively 
in the literature [1]-[6]. Our preliminary results 
appeared in [7], [8]. It is known that, for stable distributions^1 
(which includes the Gaussian distribution as the only finite 
variance member), the optimal estimator is linear at all signal 
to noise ratios (SNR). Stable distributions are a subset of a 
family called infinitely divisible distributions which, as we 
show in this paper, satisfy the derived necessary conditions for 
the existence of a matching source/noise distribution such that 
the optimal estimator is linear at any SNR level. Our main 
contribution relative to prior work, which studied linearity 
as it applies simultaneously at all SNR levels, focuses on 
the linearity of optimal estimation for the Lp norm and its 
dependence on the SNR level. Specifically, we present the 
optimality conditions for linearity of optimal estimators at a 
specified SNR, where optimality is in the sense of the Lp 
norm. As an important special case, we investigate the $p = 2$ 
case (mean square error) in detail. Note that a similar problem 
has been studied in [9], [10] for the special case of the mean 
square error, albeit without further study related to questions 
of existence and uniqueness of "matching" distributions. We 
show that the necessary conditions presented in [9], [10] are 
subsumed in our general necessary and sufficient conditions; 
and specify conditions for which such matching distributions 
exist and are unique. The analysis is then extended to the case 
of vector spaces. Interestingly, this extension is non-trivial and 
new constraints, beyond those inherited from the scalar case, 
must be satisfied to ensure linearity of optimal estimation. 

Five results are provided on the linearity of optimal estima- 
tion. First, we show that if a given noise (alternatively, a given 
source) distribution satisfies certain conditions, there always 
exists a matching source (alternatively, noise) distribution of 
a given power, for which the optimal estimator is linear. We 
further identify conditions under which such a matching dis- 
tribution does not exist. Secondly, we show that if the source 
and the noise have the same variance, they must be identically 
distributed to ensure the linearity of the optimal estimator. 
Having established more general conditions for linearity of 
optimal estimation, one wonders in what precise sense the 
Gaussian case may be special. This question is answered by 
the third result. We consider the optimality of linear estimation 
at multiple SNR values. Let random variables X and Z be 
source and noise, respectively, and allow for scaling of either 
to produce varying levels of SNR. We show that if the optimal 
estimator is linear at more than one SNR value, then both the 
source X and the noise Z must be Gaussian. In other words, 
the Gaussian source-noise pair is unique in the sense that it 
offers linearity of optimal estimators at multiple SNR values 
(in fact the optimal estimator is linear at all SNR as is well 
known). As a fourth result, we show that the MSE optimal 
estimator converges to a linear estimator for any source and 
Gaussian noise at asymptotically low SNR, and vice versa, for 
any noise and Gaussian source at asymptotically high SNR. 

^1 A distribution is called stable if, for independent identically distributed 
$X_1$, $X_2$, $X$ and any constants $a$, $b$, the random variable $aX_1 + bX_2$ has the 
same distribution as $cX + d$ for some constants $c$ and $d$ [5]. 

Fig. 1. The general setup of the problem. 

Finally, we analyze the vector case, where conditions for 
linearity of optimal estimation are more stringent. We show 
that for a vector source-channel pair with identical dimensions, 
the conditions derived for the scalar case become necessary 
conditions in a transform domain, where the transform jointly 
diagonalizes the source and channel covariance matrices. We 
further derive the additional, complementary conditions that 
must be satisfied to achieve sufficiency. 

The paper is organized as follows: we review optimal and 
linear estimation in Section II, present the main result in 
Section III, its main corollaries in Section IV, the vector case 
in Section V, and conclusions in Section VI. 

II. Review of Optimal and Linear Estimation 
A. Preliminaries and Notation 

Let $\mathbb{R}$, $\mathbb{R}^+$, and $\mathbb{N}$ denote the respective sets of real numbers, 
positive real numbers and natural numbers. In general, 
lowercase letters (e.g., $x$) denote scalars, boldface lowercase 
(e.g., $\mathbf{x}$) vectors, uppercase (e.g., $U$, $X$) matrices and random 
variables, and boldface uppercase (e.g., $\mathbf{X}$) random vectors. 
Unless otherwise specified, vectors and random vectors have 
length $m$, and matrices have size $m \times m$. The $k$-th element 
of vector $\mathbf{x}$ is denoted by $[\mathbf{x}]_k$, and the $(i, j)$-th element 
and the $k$-th column of the matrix $U$ by $[U]_{ij}$ and $[U]_k$ 
respectively. $U^{-T}$ denotes $(U^{-1})^T$. $E[\cdot]$, $R_X$, and $R_{XZ}$ denote 
the expectation, covariance of $X$ and cross covariance of $X$ 
and $Z$ respectively. $\nabla$ denotes the gradient and $\nabla_x$ denotes 
the partial gradient with respect to $x$. $F^{(k)}(\cdot)$ denotes the $k$-th 
order derivative of the function $F(\cdot)$, i.e., $F^{(k)}(x) = \frac{d^k F(x)}{dx^k}$. 

We consider the problem of estimating source $X$ given the 
observation $Y = X + Z$, where $X$ and $Z$ are independent, 
as shown in Figure 1. Let $X$ and $Z$ be scalar zero mean^2 
random variables with respective densities $f_X(\cdot)$ and $f_Z(\cdot)$ and 
characteristic functions $F_X(\omega)$ and $F_Z(\omega)$. A density $f(x)$ is 
said to be symmetric if it has an even characteristic function^3, 
i.e., $f(x) = f(-x)$, $\forall x \in \mathbb{R}$. The SNR is $\gamma = \sigma_X^2 / \sigma_Z^2$, where $\sigma_X^2 = E\{X^2\}$ 
and $\sigma_Z^2 = E\{Z^2\}$. In any statement concerning Lp 
norm, all random variables are assumed to have finite $p$-th order 
moments, e.g., in any result associated with MSE we assume 
finite variances, $\sigma_X^2 < \infty$, $\sigma_Z^2 < \infty$. All the logarithms in the 
paper are natural logarithms and may in general be complex. 

^2 The zero mean assumption is not crucial, but it considerably simplifies the 
notation. Therefore, it is kept throughout the paper. 

In the rest of this section, we review and derive some 
preliminary results concerning optimal estimators which will 
be useful in the following sections in proving our main results. 
An estimator $h(\cdot)$ is a function of the observation and is said 
to be optimal if it minimizes the cost functional 

$$J(h) = E\{\Phi(X, h(Y))\} \tag{1}$$



for a given distortion measure $\Phi$, which is assumed to be first 
order differentiable. Specializing (1) to a difference distortion 
measure, we explicitly get: 

$$J(h) = \iint \Phi(x - h(y))\, f_X(x)\, f_Z(y - x)\, dx\, dy \tag{2}$$



To obtain the necessary conditions for optimality, we apply 
the standard method in variational calculus [11]: 

$$\frac{\partial}{\partial \epsilon} J(h + \epsilon \eta) \Big|_{\epsilon = 0} = 0 \tag{3}$$



for all variation functions $\eta(\cdot)$. Then, (3) yields 

$$\iint \Phi'(x - h(y))\, \eta(y)\, f_X(x)\, f_Z(y - x)\, dx\, dy = 0 \tag{4}$$

or, 

$$E\{\Phi'(X - h(Y))\, \eta(Y)\} = 0 \tag{5}$$

for all variation functions $\eta(\cdot)$, where $\Phi'$ is the derivative of $\Phi$. 

B. Optimality condition for Lp norm 

Hereafter, we will specialize to the case of the Lp metric 
with even $p$, i.e., $\Phi(x) = |x|^p$ for even^4 and natural $p$. 
Using the fact that 

$$\frac{d}{dx} |x|^p = p\, \frac{|x|^p}{x}, \quad \forall x \in \mathbb{R} - \{0\},$$

we derive the necessary condition for optimality of an estimator as: 

$$E\{[X - h(Y)]^{p-1}\, \eta(Y)\} = 0 \tag{6}$$



Note that for $p = 2$, or $\Phi(x) = x^2$, this condition reduces 
to the well known orthogonality condition of MSE, i.e., the 
following holds: 

$$E\{[X - h(Y)]\, \eta(Y)\} = 0 \tag{7}$$

for any function $\eta(\cdot)$. The MSE optimal estimator $h(Y) = E\{X|Y\}$ 
can be directly obtained from (7). The following 
lemma formally states that the above necessary condition, (6), 
is also sufficient for minimizing Lp norm. 



^3 Note that this definition requires generalization to symmetry about the 
mean when one drops the assumption of zero-mean random variables. 

^4 Although some of the high level results may be derived for all natural p, 
in this paper we focus on even p which enables considerable simplification 
of the results, hence providing much insight and clear intuitive interpretation 
of the solution. 
Lemma 1. The necessary condition stated in (6) is sufficient. 
Moreover, the optimal estimator is unique almost everywhere 
(optimal estimators may only differ over a set of zero measure). 

Proof: See Appendix A. ■ 



C. Lp Optimal Linear Estimation 

To derive the optimal linear estimator, the variation function 
$\eta(y)$ must be made linear to ensure linearity of $h(y) + \epsilon \eta(y)$. 
Plugging $h(Y) = kY$ and $\eta(Y) = aY$ (for some $a \in \mathbb{R}$) in 
(6) and omitting straightforward steps, we obtain the condition 
for optimal linear estimation to be: 

$$E\{(X - kY)^{p-1}\, Y\} = 0 \tag{8}$$

The optimal scaling coefficient $k$ can be found by plugging 
$Y = X + Z$ into (8). Observe that for $p = 2$, we get the well 
known result $k = \frac{\gamma}{\gamma + 1}$. 
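As an illustrative sketch (not part of the original derivation), condition (8) can be solved numerically for $k$. The helper name `lp_linear_coefficient`, the bisection approach, and the two small probability mass functions below are assumptions chosen for illustration; discrete distributions are used only so that the expectation is a finite sum. For even $p$ and zero-mean $X$ and $Z$, the left side of (8) is non-increasing in $k$, positive at $k = 0$ and negative at $k = 1$, so bisection on $[0, 1]$ suffices.

```python
from itertools import product


def lp_linear_coefficient(px, pz, p=2, iters=80):
    """Solve E{(X - kY)^(p-1) Y} = 0 for k, where Y = X + Z, by bisection.

    px, pz map values to probabilities of independent zero-mean X and Z;
    p must be an even natural number.
    """
    def g(k):
        return sum(wx * wz * (x - k * (x + z)) ** (p - 1) * (x + z)
                   for (x, wx), (z, wz) in product(px.items(), pz.items()))

    lo, hi = 0.0, 1.0              # g(0) = E{X^p} > 0 and g(1) = -E{Z^p} < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid               # g is non-increasing in k for even p
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical zero-mean pmfs: var(X) = 2 and var(Z) = 1, so gamma = 2.
px = {-2.0: 1 / 3, 1.0: 2 / 3}
pz = {-1.0: 0.5, 1.0: 0.5}
gamma = 2.0

k_mse = lp_linear_coefficient(px, pz, p=2)       # should agree with gamma / (gamma + 1)
k_l4_same = lp_linear_coefficient(px, px, p=4)   # identical source and noise
print(k_mse, k_l4_same)
```

For $p = 2$ the computed coefficient agrees with $\gamma / (\gamma + 1)$, and for identically distributed source and noise the $L_4$ coefficient is $1/2$ by the exchange symmetry of $X$ and $Z$.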

D. Gaussian Source and Channel 

We next consider the special case in which both $X$ and $Z$ 
are Gaussian, $X \sim \mathcal{N}(0, \sigma_X^2)$ and $Z \sim \mathcal{N}(0, \sigma_Z^2)$. The linear 
estimator 

$$h(Y) = \frac{\gamma}{\gamma + 1}\, Y \tag{9}$$

is well known to be the optimal MSE estimator. A relatively 
less known fact is that this linear estimator is optimal more 
generally for the Lp norm [12]. It is straightforward to 
show that this linear estimator satisfies (6) by rendering the 
reconstruction error $X - h(Y)$ independent of $Y$. 

III. Conditions for Linearity of Optimal 
Estimation 

In this section, we find the necessary and sufficient conditions 
in terms of characteristic functions $F_X(\omega)$ and $F_Z(\omega)$ 
that ensure that $h(Y) = kY$ is the optimal estimator for some 
$k \in \mathbb{R}$. We first provide the result for the Lp norm, which 
takes the form of a differential equation that must be satisfied 
to ensure linearity of optimal estimation, and then specialize 
it to the MSE case. 



A. Lp Norm Condition 

As stated previously for any Lp norm result, the characteristic 
functions of the source and noise, $F_X(\omega)$ and $F_Z(\omega)$, are 
assumed to be $p$-th order differentiable. 

Theorem 1. Given an Lp distortion measure, source $X$ 
and noise $Z$ with characteristic functions $F_X(\omega)$ and $F_Z(\omega)$ 
respectively, the optimal estimator is linear, $h(Y) = kY$, 
where $Y = X + Z$, if and only if the following differential 
equation is satisfied: 

$$\sum_{m=0}^{p-1} \binom{p-1}{m} \left(\frac{k-1}{k}\right)^{m} F_X^{(m)}(\omega)\, F_Z^{(p-1-m)}(\omega) = 0 \tag{10}$$

Proof: See Appendix B. ■ 

B. Specializing to MSE: The Matching Condition 

In this section, we explore the impact of Theorem 1 for 
the special case of the mean square error distortion metric, 
i.e., $p = 2$. More precisely, we wish to find the entire set 
of source and channel distributions such that $h(Y) = \frac{\gamma}{\gamma + 1} Y$ 
is the optimal estimator for a given SNR, $\gamma$. Note that this 
condition was derived in another context [9], [10], albeit without 
consideration of important implications which we focus 
on, including the conditions for existence and uniqueness of 
matching distributions. Specifically, we identify the conditions 
for existence and uniqueness of a source distribution that 
matches the noise (and vice versa) in a way that guarantees 
the linearity of the optimal estimator. We state the main result 
for MSE in the following theorem. 

Theorem 2. Given SNR level $\gamma$, and noise $Z$ with characteristic 
function $F_Z(\omega)$, there exists a source $X$ for which the 
optimal estimator is linear if and only if the function 

$$F(\omega) = F_Z^{\gamma}(\omega)$$

is a legitimate characteristic function. Moreover, if $F(\omega)$ is 
legitimate, then it is the characteristic function of the matching 
source, i.e., $F_X(\omega) = F(\omega)$. 

(An equivalent theorem holds where we replace "noise" 
for "source" everywhere, i.e., given source and SNR level, we 
have a condition for existence of a matching noise.) 

Proof: Plugging $p = 2$ and $k = \frac{\gamma}{\gamma + 1}$ in (10) yields 

$$\frac{1}{F_X(\omega)} \frac{dF_X(\omega)}{d\omega} = \gamma\, \frac{1}{F_Z(\omega)} \frac{dF_Z(\omega)}{d\omega} \tag{11}$$

or more compactly, 

$$\frac{d}{d\omega} \log F_X(\omega) = \gamma\, \frac{d}{d\omega} \log F_Z(\omega) \tag{12}$$

The solution to this differential equation is given by: 

$$\log F_X(\omega) = \gamma \log F_Z(\omega) + C \tag{13}$$

where $C$ is a constant. Imposing $F_Z(0) = F_X(0) = 1$, we 
obtain $C = 0$, which implies: 

$$F_X(\omega) = F_Z^{\gamma}(\omega) \tag{14}$$

■ 

Hence, given a noise distribution, the necessary and sufficient 
condition for the existence of a matching source distribution 
boils down to the requirement that $F_Z^{\gamma}(\omega)$ be a valid 
characteristic function. Moreover, if such a matching source 
exists, we have a recipe for deriving its distribution. 
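The matching condition (14) can be exercised numerically. The sketch below is illustrative only: the helper names `matching_source_cf` and `variance_from_cf`, the choice $\gamma = 2.5$, and the two example noise characteristic functions (standard Gaussian and a Laplace density with variance 2) are assumptions, not taken from the paper. It raises the noise characteristic function to the power $\gamma$ on the principal branch and checks that the resulting variance, estimated as $-F''(0)$ by a central difference, equals $\gamma \sigma_Z^2$ as the definition of SNR requires.

```python
import math


def matching_source_cf(noise_cf, gamma):
    """Return F_X(w) = F_Z(w)**gamma (principal branch), following (14)."""
    return lambda w: complex(noise_cf(w)) ** gamma


def variance_from_cf(cf, h=1e-3):
    """Approximate the variance -F''(0) of a zero-mean variable by a central difference."""
    return -((cf(h) - 2 * cf(0.0) + cf(-h)) / h ** 2).real


gaussian_cf = lambda w: math.exp(-0.5 * w * w)   # N(0, 1) noise, variance 1
laplace_cf = lambda w: 1.0 / (1.0 + w * w)       # Laplace noise with scale 1, variance 2

gamma = 2.5
fx = matching_source_cf(gaussian_cf, gamma)
fx_laplace = matching_source_cf(laplace_cf, gamma)

# For Gaussian noise the matching source is again Gaussian, with variance gamma.
print(abs(fx(1.3) - math.exp(-0.5 * gamma * 1.3 ** 2)))
print(variance_from_cf(fx), variance_from_cf(fx_laplace))
```

Both example noise distributions are infinitely divisible, so the power is a valid characteristic function for every positive $\gamma$; for a general noise distribution the validity of $F_Z^{\gamma}(\omega)$ must be checked separately.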

C. Existence of a Matching Source for a Given Noise 

In this section, we study the conditions under which a 
matching source exists for a given noise distribution. Along 
the way, we also study some important properties relating 
the matching distributions when they exist. 

We begin with Bochner's theorem, which states that 
a continuous function $F : \mathbb{R} \to \mathbb{C}$ with $F(0) = 1$ is a 
valid characteristic function if and only if it is positive 
semi-definite^5. Hence, the existence of a matching source depends 
on the positive semi-definiteness of $F_Z^{\gamma}(\omega)$. 

We note that characterizing the entire set of $F_Z(\omega)$ for which 
$F_Z^{\gamma}(\omega)$ is positive semi-definite is a long-standing open problem. 
Instead, we illustrate the result with various cases of 
interest where $F_Z^{\gamma}(\omega)$ is, or is not, positive semi-definite. Let 
us start with a simple but useful case. 

Corollary 1. If SNR $\gamma \in \mathbb{N}$, a matching source distribution 
exists, regardless of the noise distribution. 

Proof: From (14), natural $\gamma$ implies: 

$$X = \sum_{i=1}^{\gamma} Z_i \tag{15}$$

where the $Z_i$ are independent and distributed identically to $Z$. 
Hence, $F_Z^{\gamma}(\omega)$ is a valid characteristic function and a matching 
$X$ exists. ■ 
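Corollary 1 can be checked exactly on a small discrete example. The sketch below is an illustration under stated assumptions: the helper names `convolve` and `conditional_mean` and the two-point noise pmf are hypothetical, and integer-valued variables are used so that all arithmetic is exact with `Fraction`. The source is built as the sum of $\gamma = 2$ independent copies of the noise, and the conditional mean $E\{X | Y = y\}$ is compared with $\frac{\gamma}{\gamma + 1} y$ at every attainable $y$.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(p, q):
    """PMF of the sum of two independent integer-valued random variables."""
    out = defaultdict(Fraction)
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] += pa * qb
    return dict(out)


def conditional_mean(px, pz):
    """Return {y: E[X | Y = y]} for Y = X + Z with independent X and Z."""
    num, den = defaultdict(Fraction), defaultdict(Fraction)
    for x, wx in px.items():
        for z, wz in pz.items():
            num[x + z] += wx * wz * x
            den[x + z] += wx * wz
    return {y: num[y] / den[y] for y in den}


# Hypothetical zero-mean, asymmetric noise pmf on the integers.
pz = {-2: Fraction(1, 3), 1: Fraction(2, 3)}

gamma = 2
px = pz
for _ in range(gamma - 1):          # X = Z_1 + ... + Z_gamma, as in (15)
    px = convolve(px, pz)

h = conditional_mean(px, pz)
for y in sorted(h):
    print(y, h[y], Fraction(gamma, gamma + 1) * y)
```

The two printed columns coincide for every $y$, so the optimal estimator is exactly linear for this pair, even though neither distribution is Gaussian.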
Next, we recall the concept of infinite divisibility, which is 
closely related to the problem at hand. 

Definition [13]: A distribution with characteristic function 
$F(\omega)$ is called infinitely divisible if, for each integer $k \ge 1$, 
there exists a characteristic function $F_k(\omega)$ such that 

$$F(\omega) = F_k^{k}(\omega) \tag{16}$$

Alternatively, $f_X(\cdot)$ is infinitely divisible if and only if the 
random variable $X$ can be written for any $k$ as $X = \sum_{i=1}^{k} X_i$ 
where $\{X_i, i = 1, \ldots, k\}$ are independent and identically 
distributed. 

Infinitely divisible distributions have been studied extensively 
in probability theory [13], [14]. It is known that Poisson, 
exponential, and geometric distributions as well as the set of 
stable distributions (which includes the Gaussian distribution) 
are infinitely divisible. On the other hand, it is easy to see that 
distributions of discrete random variables with finite alphabets 
are not infinitely divisible. 

Corollary 2. A matching source distribution exists for all $\gamma \in \mathbb{R}^+$ 
if and only if $f_Z(\cdot)$ is infinitely divisible. 

Proof: We first note that if $f_Z(\cdot)$ is infinitely divisible, 
$F_Z^{1/j}(\omega)$ is a valid characteristic function for all natural $j$, 
as follows directly from the definition of infinite divisibility. 
Then, by Corollary 1 it follows that $F_Z^{i/j}(\omega)$ is also a valid 
characteristic function, which implies that so is $F_Z^{r}(\omega)$ for all 
positive rational $r > 0$, since a rational $r$ means that $r = i/j$ 
for some natural $i$ and $j$. Using the fact that every $\gamma \in \mathbb{R}^+$ 
is a limit of a sequence of rational numbers $r_n$, and by the 
continuity theorem [5], we conclude that $F_X(\omega) = F_Z^{\gamma}(\omega)$ is 
a valid characteristic function, and hence a matching source 
exists. 

Towards showing the converse, note that if $F_X(\omega) = F_Z^{\gamma}(\omega)$ 
is a valid characteristic function for all $\gamma$, then $f_Z(\cdot)$ has to 
be infinitely divisible, because we can always choose $\gamma = 1/k$ 
for $k \in \mathbb{N}$ and set $F_k(\omega) = F_X(\omega)$ in (16). ■ 

^5 Let $f : \mathbb{R} \to \mathbb{C}$ be a complex-valued function, and $t_1, \ldots, t_s$ be a set of 
points in $\mathbb{R}$. Then $f$ is said to be positive semi-definite (non-negative definite) 
if for any $t_i \in \mathbb{R}$ and $a_i \in \mathbb{C}$, $i = 1, \ldots, s$ we have 

$$\sum_{i=1}^{s} \sum_{j=1}^{s} a_i a_j^{*} f(t_i - t_j) \ge 0$$

where $a_j^{*}$ is the complex conjugate of $a_j$. Equivalently, we require that the 
$s \times s$ matrix constructed with $f(t_i - t_j)$ be positive semi-definite. If function 
$f$ is positive semi-definite, its Fourier transform is non-negative everywhere, 
$F(\omega) \ge 0$, $\forall \omega \in \mathbb{R}$. Hence, in the case of our candidate characteristic 
function, this requirement ensures that the corresponding density is indeed 
non-negative everywhere. 

However, note that at a given SNR, there may exist a 
matching source, even though $f_Z(\cdot)$ is not infinitely divisible. 
For example, a finite alphabet discrete random variable $V$ 
is not infinitely divisible but still can be $k$-divisible, where 
$k < |V| - 1$ and $|V|$ is the cardinality of $V$. Hence, when 
$\gamma = 1/k$, there may exist a matching source, even when the 
noise distribution is not infinitely divisible. Many examples 
follow directly from Corollary 1. 

We next cite a theorem, regarding analytic characteristic 
functions, which will be useful in the proofs that follow. 

Theorem [13]: A characteristic function $F(\omega)$ is analytic if 
and only if $F$ has finite moments of all orders and there exists a 
finite $\beta$ such that $E\{|X^k|\} \le k!\, \beta^k$, $\forall k \in \mathbb{N}$. This requirement 
is equivalent to the existence of a moment generating function. 
A characteristic function $F(\omega)$ is analytic if and only if 
the moments $E\{|X^k|\}$ uniquely characterize the distribution, 
which in general is not the case, see e.g. [15]. 

A useful property of the matching pair, relating the ana- 
lyticities of their characteristic functions is captured by the 
following corollary. 

Corollary 3. If $F_Z(\omega)$ (or $F_X(\omega)$) is analytic, then the 
matching $F_X(\omega)$ (or $F_Z(\omega)$), if it exists, is analytic. 

Proof: Recall the orthogonality property of the MSE 
optimal estimator (7). Let $\eta(Y) = Y^m$ for $m = 1, 2, 3, \ldots, M$. 
Plugging the best linear estimator $h(Y) = \frac{\gamma}{\gamma + 1} Y$ and replacing 
$Y$ with $X + Z$, we obtain the condition 

$$E\left\{\left[X - \frac{\gamma}{\gamma + 1}(X + Z)\right](X + Z)^m\right\} = 0 \quad \text{for } m = 1, \ldots, M \tag{17}$$

Applying the binomial expansion 

$$(X + Z)^m = \sum_{i=0}^{m} \binom{m}{i} X^i Z^{m-i} \tag{18}$$

and rearranging the terms, we obtain $M$ linear equations that 
recursively relate the $M + 1$ moments of $X$, i.e., for $m = 1, \ldots, M$ 
we have 

$$E(X^{m+1}) = \gamma E(Z^{m+1}) + \sum_{i=0}^{m-1} A(\gamma, m, i)\, E(Z^{i+1})\, E(X^{m-i}) \tag{19}$$

where $A(\gamma, m, i) = \gamma \binom{m}{i} - \binom{m}{i+1}$. 

Note that if $F_Z(\omega)$ is analytic, $Z$ has finite moments of all 
orders and $E\{|Z^k|\} \le k!\, \beta^k$, $\forall k$. From (19), by induction, we 
can show that all moments of $X$ exist and are bounded by 
$E\{|X^k|\} \le k!\, (\max\{\gamma, 1\}\beta)^k$. This condition is sufficient to 
show that $X$ also has an analytic characteristic function. ■ 
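The recursion (19) is straightforward to run. The following sketch is illustrative: the function names `matching_source_moments` and `standard_gaussian_moments` are hypothetical, and the Gaussian noise with $\gamma = 2$ is an assumed test case chosen because the answer is known in closed form (the matching source is $\mathcal{N}(0, 2)$, with even moments 2, 12, 120). Exact rational arithmetic is used so the comparison is exact.

```python
from fractions import Fraction
from math import comb


def matching_source_moments(noise_moments, gamma, M):
    """Return [E(X^0), ..., E(X^(M+1))] from recursion (19).

    noise_moments[k] must hold E(Z^k) for k = 0..M+1; X and Z are zero mean.
    """
    ex = [Fraction(1), Fraction(0)]                  # E(X^0) = 1, E(X^1) = 0
    for m in range(1, M + 1):
        total = gamma * noise_moments[m + 1]
        for i in range(m):
            a = gamma * comb(m, i) - comb(m, i + 1)  # A(gamma, m, i)
            total += a * noise_moments[i + 1] * ex[m - i]
        ex.append(Fraction(total))
    return ex


def standard_gaussian_moments(n):
    """Moments E(Z^k), k = 0..n, of N(0, 1): zero for odd k, (k-1)!! for even k."""
    mom = [Fraction(1), Fraction(0)]
    for k in range(2, n + 1):
        mom.append((k - 1) * mom[k - 2])
    return mom


M = 5
gamma = Fraction(2)
ez = standard_gaussian_moments(M + 1)
ex = matching_source_moments(ez, gamma, M)
print([str(v) for v in ex])   # moments of the matching source, orders 0..6
```

Each step introduces exactly one new unknown, $E(X^{m+1})$, which is why the recursion determines the moment sequence of the matching source uniquely.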
The following corollary identifies a case in which a match- 
ing source does not exist. 
Corollary 4. For $\gamma \notin \mathbb{N}$, if $F_Z(\omega)$ is real and analytic and it 
is negative somewhere, i.e., $\exists\, \omega$ such that $F_Z(\omega) < 0$, then a 
matching source distribution does not exist. 

Proof: We prove this corollary by contradiction. Let 
$F_Z(\omega)$ be a valid characteristic function. Let us first assume 
that a matching source, $X$, exists. Hence, from Corollary 3 it 
follows that $X$ must have an analytic characteristic function, 
$F_X(\omega)$. We will show that this leads to a contradiction. Recall 
the set of moment equations (19). It follows by induction over 
the set of moment equations starting from $m = 1$ that, if all 
odd moments of $Z$ are zero, then so are all odd moments of 
$X$. As the noise is symmetric, it follows from analyticity of 
$F_X(\omega)$ that the matching source must also be symmetric, since 
moments of $X$ fully characterize its distribution. 

However, if $\gamma \notin \mathbb{N}$, by (14), it follows that $F_X(\omega)$ is 
not real everywhere, and hence $f_X(\cdot)$ is not symmetric. 
This contradiction shows that no matching source exists for 
symmetric noise distributions which are non positive semi-definite 
when $\gamma \notin \mathbb{N}$. ■ 

Let us provide a commonly used example distribution to 
which the above corollary applies: the uniform distribution over 
$[-a, a]$. In this case, $f_Z(\cdot)$ is symmetric with an analytic 
characteristic function, but it is not positive semi-definite. The 
corollary states that, except for natural values of SNR, the 
optimal estimator is strictly nonlinear for an additive uniform 
channel. Example 1 illustrates this point with a numerical 
example. 
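The obstruction for uniform noise can be seen at a single frequency. The sketch below is only an illustration for one assumed SNR value, $\gamma = 3/2$; the helper name `power_branches` and the evaluation point $\omega_0 = 4$ are hypothetical choices. It evaluates the characteristic function of the uniform density on $[-1, 1]$ at a point where it is negative, enumerates every complex value of its $3/2$ power, and shows that none of them is real, whereas a symmetric source would require a real characteristic function.

```python
import cmath
import math


def power_branches(value, num, den):
    """All values of value**(num/den) for nonzero `value`, with num/den in lowest terms."""
    r, theta = cmath.polar(complex(value))
    g = num / den
    return [cmath.rect(r ** g, g * (theta + 2 * math.pi * n)) for n in range(den)]


a = 1.0
uniform_cf = lambda w: math.sin(a * w) / (a * w)    # CF of Uniform[-a, a], for w != 0

w0 = 4.0                                            # sin(4) < 0, so F_Z(w0) < 0
fz = uniform_cf(w0)
candidates = power_branches(fz, 3, 2)               # gamma = 3/2

print(fz)
for c in candidates:
    print(c)        # each candidate value has a nonzero imaginary part
```

This is a spot check at one frequency and one SNR, not a substitute for the proof of Corollary 4.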

Remark: As an important application, consider high resolution 
quantization theory. Standard high resolution approximations 
assume quantization noise independent of (or uncorrelated 
with) the source [16]. In practice, such approximations 
can be made explicit by using a dithered quantizer [17] 
that generates quantization error independent of the source. 
Then, the quantizer is equivalent to an additive uniform noise 
channel. The corollary states that, other than for natural values 
of SNR, a linear decoder (e.g., a Wiener filter at the decoder) 
is strictly suboptimal for sources encoded at high resolution 
or by dithered quantization. 

D. Uniqueness of a Matching Source for a Given Noise 



Note that (14) may have multiple solutions due to multiplicity 
of complex roots. The following corollary establishes that 
for a large set of source (or noise) distributions, the matching 
noise (or source) is unique. 

Corollary 5. If $F_Z(\omega)$ (or $F_X(\omega)$) is analytic, then the 
matching $F_X(\omega)$ (or $F_Z(\omega)$) is unique. 

Proof: We prove this corollary from the set of moment 
equations (19). Note that every equation introduces a new 
variable $E(X^{m+1})$, for $m = 1, \ldots, M$, so each new equation 
is linearly independent of its predecessors. Let us consider 
solving these equations recursively, starting from $m = 1$. 
At each $m$, we have one unknown ($E(X^{m+1})$) in a "linear" 
equation. Since the number of equations is equal to the number 
of unknowns for each $m$, and the equations are linear in terms 
of the unknown, there must exist a unique moment sequence 
that solves (19). From Corollary 3 it also follows that $X$ 
has an analytic characteristic function. Hence, the moment 
sequence fully characterizes $X$ and the matching source $X$ 
(if it exists) is unique. ■ 

IV. Implications of the Linearity Conditions 

In this section, we explore some special cases obtained by 
varying $\gamma$ and utilizing the matching conditions for MSE and 
Lp. We start with a simple but perhaps surprising result. 

Theorem 3. Given a source and noise of equal variance, the 
Lp optimal estimator is linear if and only if the noise and 
source distributions are identical, i.e., $f_X(x) = f_Z(x)$, $\forall x \in \mathbb{R}$, 
in which case the optimal estimator is $h(Y) = \frac{1}{2} Y$. 

Proof: For MSE, it is straightforward to see from (14) 
that, at $\gamma = 1$, the characteristic functions must be identical. 
Since the characteristic function uniquely determines 
the distribution [5], $f_X(x) = f_Z(x)$, $\forall x \in \mathbb{R}$. In fact, this 
result applies more generally: it can be observed directly 
from Theorem 1 that $F_Z(\omega) = F_X(\omega)$ satisfies the necessary 
and sufficient optimality condition, and hence this result also 
applies to the Lp norm distortion measure. ■ 
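The sufficiency direction of Theorem 3 can be verified exactly for a discrete pair. The sketch below is an illustration under assumptions: the helper `conditional_error_moment` and the two-point pmf are hypothetical. With identically distributed $X$ and $Z$ and $h(Y) = Y/2$, it evaluates $E\{(X - Y/2)^{p-1} | Y = y\}$ for $p = 2, 4, 6$, which is the conditional form of the optimality condition (6), and confirms that it vanishes at every attainable $y$.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_error_moment(px, pz, k, power):
    """Return {y: E[(X - k*Y)**power | Y = y]} for Y = X + Z, X and Z independent."""
    num, den = defaultdict(Fraction), defaultdict(Fraction)
    for x, wx in px.items():
        for z, wz in pz.items():
            y = x + z
            num[y] += wx * wz * (x - k * y) ** power
            den[y] += wx * wz
    return {y: num[y] / den[y] for y in den}


# Hypothetical zero-mean, asymmetric pmf used for both source and noise.
pmf = {-2: Fraction(1, 3), 1: Fraction(2, 3)}
k = Fraction(1, 2)

results = {p: conditional_error_moment(pmf, pmf, k, p - 1) for p in (2, 4, 6)}
for p, table in results.items():
    print(p, {y: str(v) for y, v in table.items()})
```

The result follows from the exchange symmetry of $X$ and $Z$ given their sum, which is why no Gaussian assumption is needed at $\gamma = 1$.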

Our next result pertains to the special role of the Gaussian 
distribution in the context of linearity of optimal estimation. It is 
well known that linearity of optimal estimation for all SNR 
levels characterizes the stable family of distributions, which 
includes Gaussian as the only finite variance member [1], [2], 
[9], [18], [19]. However, all prior results on characterizing 
the Gaussian density using linearity of optimal estimation consider 
optimal estimation for all SNR levels, $\gamma \in \mathbb{R}^+$. 

Let us consider a setup with given source and noise variables 
which may be scaled to vary the SNR, $\gamma$. Can the optimal 
estimator be linear at multiple values of $\gamma$? This question 
is motivated by the practical setting where $\gamma$ is not known 
in advance or may vary (e.g., in the design stage of a 
communication system). It is well-known that the Gaussian 
source-Gaussian noise pair makes the optimal estimator linear 
at all $\gamma$ levels. Below, we show that this is the only source-channel 
pair whose optimal estimators are linear at more than 
one SNR value. 

Theorem 4. Let the source or channel variables be scaled 
to vary the SNR, $\gamma$. The MSE optimal estimator is linear at 
two different SNR values $\gamma_1$ and $\gamma_2$, if and only if source and 
noise are both Gaussian. Moreover, this claim also holds for 
Lp norm if the source (or noise) has an analytic characteristic 
function. 

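Proof: Let $Z_1$ and $Z_2$ denote the noise random variables 
with variances $\sigma_{Z_1}^2$, $\sigma_{Z_2}^2$ and characteristic functions 
$F_{Z_1}(\omega)$, $F_{Z_2}(\omega)$ respectively. Let us say the noise is scaled 
by $\alpha \in \mathbb{R}$, i.e., $Z_2 = \alpha Z_1$ and hence $F_{Z_2}(\omega) = F_{Z_1}(\alpha\omega)$ and 
$\sigma_{Z_2}^2 = \alpha^2 \sigma_{Z_1}^2$. Let 

$$\gamma_1 = \frac{\sigma_X^2}{\sigma_{Z_1}^2}, \qquad \gamma_2 = \frac{\sigma_X^2}{\sigma_{Z_2}^2} = \frac{\gamma_1}{\alpha^2} \tag{20}$$

Using (14), 

$$F_X(\omega) = F_{Z_1}^{\gamma_1}(\omega), \qquad F_X(\omega) = F_{Z_2}^{\gamma_2}(\omega) = F_{Z_1}^{\gamma_2}(\alpha\omega) \tag{21}$$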
Fig. 2. The optimal estimator at various SNR values, (a) SNR = 0.1, (b) SNR = 1, (c) SNR = 10, when $X \sim \mathcal{N}(0, 1)$ and $Z$ is distributed uniformly on the interval $[-a, a]$. The SNR 
is varied by changing $a$. Observe that the optimal estimator converges to linear as SNR increases. 



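Hence, 

$$F_{Z_1}^{\gamma_1}(\omega) = F_{Z_1}^{\gamma_2}(\alpha\omega) \tag{22}$$

Taking the logarithm on both sides of (22), applying (20) and 
rearranging terms, we obtain 

$$\alpha^2 = \frac{\log F_{Z_1}(\alpha\omega)}{\log F_{Z_1}(\omega)} \tag{23}$$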

Note that (23) should be satisfied for both $\alpha$ and $-\alpha$ since 
they yield the same $\gamma$. Hence, $F_{Z_1}(\alpha\omega) = F_{Z_1}(-\alpha\omega)$ for all 
$\alpha \in \mathbb{R}$, which implies $F_{Z_1}(\omega) = F_{Z_1}(-\omega)$, $\forall \omega \in \mathbb{R}$. Using 
the fact that the characteristic function is conjugate symmetric 
(i.e., $F_{Z_1}(-\omega) = F_{Z_1}^{*}(\omega)$), we get $F_{Z_1}(\omega) \in \mathbb{R}$, $\forall \omega$. As 
$\log F_{Z_1}(\omega)$ is a function from $\mathbb{R} \to \mathbb{C}$, Weierstrass theorem 
[20] guarantees that there is a sequence of polynomials that 
uniformly converges to it: $\log F_{Z_1}(\omega) = \sum_{i=0}^{\infty} k_i \omega^i$ where 
$k_i \in \mathbb{C}$. Hence, by (23) we obtain: 

$$\alpha^2 = \frac{\sum_{i=0}^{\infty} k_i (\alpha\omega)^i}{\sum_{i=0}^{\infty} k_i \omega^i} \tag{24}$$

which is satisfied for all $\omega$ only if all coefficients $k_i$ vanish, 
except for $k_2$, i.e., $\log F_{Z_1}(\omega) = k_2 \omega^2$, or $\log F_{Z_1}(\omega) = 0$, 
$\forall \omega \in \mathbb{R}$ (the solution $\alpha = 1$ is of no interest). The 
latter is not a characteristic function, and the former is the 
Gaussian characteristic function, $F_{Z_1}(\omega) = e^{k_2 \omega^2}$, where we 
use the established fact that $F_{Z_1}(\omega) \in \mathbb{R}$. Since a characteristic 
function determines the distribution uniquely, the Gaussian 
source and noise must be the only such pair. 
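Condition (23) is easy to probe numerically. In the sketch below, the helper name `log_cf_ratio`, the scaling $\alpha = 1.5$, the sample frequencies, and the Laplace comparison case are all assumptions made for illustration. The ratio $\log F_{Z_1}(\alpha\omega) / \log F_{Z_1}(\omega)$ is evaluated at several frequencies: for a Gaussian characteristic function it equals $\alpha^2$ at every $\omega$, while for the Laplace characteristic function it varies with $\omega$, so (23) cannot hold.

```python
import math


def log_cf_ratio(cf, alpha, w):
    """Ratio log F(alpha * w) / log F(w) appearing on the right side of (23)."""
    return math.log(cf(alpha * w)) / math.log(cf(w))


gaussian_cf = lambda w: math.exp(-0.5 * w * w)   # N(0, 1)
laplace_cf = lambda w: 1.0 / (1.0 + w * w)       # Laplace with scale 1

alpha = 1.5
ws = [0.5, 1.0, 2.0]

gauss_ratios = [log_cf_ratio(gaussian_cf, alpha, w) for w in ws]
laplace_ratios = [log_cf_ratio(laplace_cf, alpha, w) for w in ws]

print(gauss_ratios)     # constant in w, equal to alpha**2
print(laplace_ratios)   # changes with w
```

Both example characteristic functions are real and positive, so the real logarithm suffices here; a general check would need the complex logarithm mentioned in Section II.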

Next, we extend the result to the Lp norm, albeit we require 
analyticity of the characteristic function of $X$ (or $Z_1$ and $Z_2$). 
Then, due to Corollary 3, matching noises $Z_1$ and $Z_2$ also have 
analytic characteristic functions and hence the moments of 
$X$, $Z_1$ and $Z_2$ are finite (they have moments of all orders) and 
moments fully characterize the distribution. The extension to 
Lp requires a different approach. For simplicity, we first derive 
the result for MSE (now with analyticity imposed) and then 
extend the arguments to the Lp case. The following relation 
between the moments of the original and scaled noise should 
be satisfied: 

$$E(Z_2^{m}) = \alpha^{m} E(Z_1^{m}) \quad \text{for } m = 1, \ldots, M + 1 \tag{25}$$

Also, a set of moment equations should hold for two SNR 
values, $\gamma_1$ and $\gamma_2$. Let us consider the set of moment equations 
with moments up to $M$: 

$$E(X^{m+1}) = \gamma_j E(Z_j^{m+1}) + \sum_{i=0}^{m-1} A(\gamma_j, m, i)\, E(Z_j^{i+1})\, E(X^{m-i}) \tag{26}$$

where $m = 1, \ldots, M$, $j = 1, 2$ and $A(\gamma, m, i) = \gamma \binom{m}{i} - \binom{m}{i+1}$. 
Similar to the proof of Corollary 5, we note that every equation 
introduces a new variable $E(X^{m+1})$, for $m = 1, \ldots, M$, so 
each new equation is independent of its predecessors. Next, we 
solve these equations recursively, starting from $m = 1$. At each 
$m$, we have three unknowns ($E(X^{m+1})$, $E(Z_1^{m+1})$, $E(Z_2^{m+1})$) 
that are related "linearly". Since the number of linearly independent 
equations is equal to the number of unknowns for 
each $m$, there must exist a unique solution. We know that 
the moment sequences of the Gaussian source-channel pair 
satisfy (26) since it ensures linearity of optimal estimation. 
The moment sequence of a Gaussian satisfies Carleman's 
general criterion [15] and therefore it uniquely determines the 
corresponding distribution, so the Gaussian source and noise 
pair is the only solution to (26). 

The proof for the Lp norm follows the same lines. Note that, as 
mentioned in Sec. II.D, the same linear estimator is Lp optimal 
for a Gaussian source-channel pair. Plugging $Y = X + Z$ in the 
optimality condition with Lp norm, (6), we reach a similar set 
of moment equations. Following similar arguments, we show 
that this result holds for the Lp norm. ■ 

Next, we investigate the asymptotic behavior of optimal 
estimation at low and high SNR. The results of our asymptotic 
analysis are of practical importance since they justify the use 
of linear estimators without recourse to complexity arguments 
at high and low asymptotic SNR regimes, under certain 
conditions. 

Theorem 5 (for MSE only). In the limit $\gamma \to 0$, the MSE 
optimal estimator is asymptotically linear if the channel is 
Gaussian, regardless of the source. Similarly, as $\gamma \to \infty$, the 
MSE optimal estimator is asymptotically linear if the source 
is Gaussian, regardless of the channel. 

Proof: We present a sketch of the proof here; a more rigorous proof is given in Appendix C. The proof follows from applying the central limit theorem [13] to the matching condition (14). The central limit theorem states that as $\gamma \to \infty$, for any finite variance noise $Z$, the characteristic function of the matching source, $F_Z^{\gamma}(\omega)$, converges pointwise to the Gaussian characteristic function. Hence, at asymptotically high SNR, any noise distribution is matched by the Gaussian source.

Similarly, as $\gamma \to 0$ and for any $F_X(\omega)$, $F_X^{1/\gamma}(\omega)$ converges pointwise to the Gaussian characteristic function and hence the MSE optimal estimator is asymptotically linear if the channel is Gaussian. ■

Fig. 3. This figure shows the variation of estimation error with the channel SNR when $X \sim \mathcal{N}(0, 1)$ and $Z$ is distributed uniformly on the interval $[-a, a]$. We observe that the error is significant at $\gamma = 0.1$ and vanishes at high SNRs.

Example 1: Let us consider a numerical example that illustrates our findings. Consider a setting where $X$ is Gaussian with unit variance, i.e., $X \sim \mathcal{N}(0, 1)$, and $Z$ is distributed uniformly on the interval $[-a, a]$. Note that this is a typical setting for high rate or dithered quantization of a Gaussian source, in the sense that the quantization error is uniform and independent of the source. We change $\gamma$ (SNR) by varying $a$ and observe how the optimal estimator ($h(Y) = E\{X|Y\}$) and the associated estimation error ($E\{(X - h(Y))^2\}$) behave for different $\gamma$. We numerically calculated the optimal estimator and the estimation error by discretizing the integrals on a uniform grid with a step size $\Delta = 0.01$, i.e., we approximated the integrals as Riemann sums. Figure 2 shows how the optimal estimator converges to a linear one as SNR increases. Note that at $\gamma = 0.1$, the optimal estimator is highly nonlinear, while at $\gamma = 10$, it practically converges to a linear one. Figure 3 demonstrates how the estimation error varies with SNR. As theoretically expected (and from Figure 2), we see a significant difference at $\gamma = 0.1$, while the difference vanishes at high SNRs.
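The Riemann-sum procedure described in Example 1 can be sketched as follows. This is a minimal illustration, not the authors' code. The function name `gaussian_uniform_estimator`, the truncation of the Gaussian support to `[-span, span]`, and the relation $a = \sqrt{3/\gamma}$ (from $\mathrm{Var}(Z) = a^2/3$ and $\gamma = \sigma_X^2/\sigma_Z^2$ with $\sigma_X^2 = 1$) are assumptions of this sketch. Both errors are evaluated on the same discretized joint density, so the optimal error can never exceed the linear one.

```python
import numpy as np


def gaussian_uniform_estimator(gamma, step=0.01, span=6.0):
    """Riemann-sum approximation of h(y) = E[X | Y = y] for X ~ N(0, 1), Z ~ U[-a, a].

    Returns the y grid, h on that grid, and the MSE of the optimal and of the
    linear estimator k*y, k = gamma / (1 + gamma), on the discretized density.
    """
    a = np.sqrt(3.0 / gamma)                       # Var(Z) = a^2 / 3 = 1 / gamma
    x = np.arange(-span, span + step, step)        # truncated source support (assumption)
    y = np.arange(-(span + a), span + a + step, step)
    fx = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    fz = (np.abs(y[:, None] - x[None, :]) <= a) / (2.0 * a)
    joint = fz * fx[None, :]                       # f_X(x) f_Z(y - x) on the grid
    fy = joint.sum(axis=1) * step                  # denominator of E[X | Y = y]
    num = (joint * x[None, :]).sum(axis=1) * step  # numerator of E[X | Y = y]
    h = np.divide(num, fy, out=np.zeros_like(num), where=fy > 0)
    k = gamma / (1.0 + gamma)
    weights = joint / joint.sum()                  # normalized discrete joint pmf
    mse_opt = ((x[None, :] - h[:, None]) ** 2 * weights).sum()
    mse_lin = ((x[None, :] - k * y[:, None]) ** 2 * weights).sum()
    return y, h, mse_opt, mse_lin


for gamma in (0.1, 1.0, 10.0):
    _, _, mse_opt, mse_lin = gaussian_uniform_estimator(gamma, step=0.02)
    print(f"gamma = {gamma}: normalized gap = {(mse_lin - mse_opt) / mse_opt:.4f}")
```

Plotting `h` against `y` for a few values of `gamma` gives curves of the kind described for Figure 2, and the printed normalized gap corresponds to the quantity discussed for Figure 3.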

V. Extension to Vector Spaces 

Extension of the conditions to the vector case is nontrivial 
due to the dependencies across components of the source and 



noise. In this section, for simplicity, we restrict ourselves to 
the MSE distortion measure. We first give the formal definition 
of the problem: 

We consider the problem of estimating the vector source $X \in \mathbb{R}^m$ given the observation $Y = X + Z$, where $X$ and $Z \in \mathbb{R}^m$ are independent, as shown in Figure 1. Without loss of generality, we assume that $X$ and $Z$ are zero mean random variables with $m$-fold distributions $f_X(\cdot)$ and $f_Z(\cdot)$. Their respective characteristic functions are denoted $F_X(\omega)$ and $F_Z(\omega)$. $R_X = E\{XX^T\}$ and $R_Z = E\{ZZ^T\}$ are the covariance matrices of $X$ and $Z$, respectively. Let $Q$ be the eigenmatrix of $R_X R_Z^{-1}$, let $U = Q^{-1}$, and let the eigenvalues $\lambda_1, \ldots, \lambda_m$ be the elements of the diagonal matrix $\Lambda$, i.e., the following holds:

$$R_X R_Z^{-1} = U^{-1} \Lambda U \tag{27}$$

We are looking for the conditions on $F_X(\omega)$ and $F_Z(\omega)$ such that $h(Y) = KY$ with $K = R_X(R_X + R_Z)^{-1}$ minimizes the estimation error $E\{\|X - h(Y)\|^2\}$.

By following a similar approach to the scalar case (details are in Appendix D), we obtain the necessary and sufficient condition of optimality:

$$U \nabla \log F_X(\omega) = \Lambda U \nabla \log F_Z(\omega) \tag{28}$$

We will make use of the following auxiliary lemma from 
matrix analysis. 



Lemma 2. Given a function $f : \mathbb{R}^m \to \mathbb{R}$, matrix $A \in \mathbb{R}^{m \times m}$ and vector $x \in \mathbb{R}^m$,

$$\nabla_x f(Ax) = A^T \nabla f(Ax) \tag{29}$$

Proof: See Appendix E.

Next, we state the main theorem in vector settings.

Theorem 6. Let the characteristic functions of the transformed source and noise ($UX$ and $UZ$) be $F_{UX}(\omega)$ and $F_{UZ}(\omega)$. The necessary and sufficient condition for linearity of optimal estimation is:

$$\frac{\partial \log F_{UX}(\omega)}{\partial \omega_i} = \lambda_i \frac{\partial \log F_{UZ}(\omega)}{\partial \omega_i}, \quad 1 \le i \le m \tag{30}$$

Proof: Let us define $\tilde{\omega} = (U^T)^{-1}\omega$, hence $\omega = U^T \tilde{\omega}$. Plugging this in (28), we have

$$U \nabla_{U^T\tilde{\omega}} \log F_X(U^T \tilde{\omega}) = \Lambda U \nabla_{U^T\tilde{\omega}} \log F_Z(U^T \tilde{\omega}) \tag{31}$$

Using Lemma 2, we can rewrite (31) as

$$\nabla_{\tilde{\omega}} \log F_X(U^T \tilde{\omega}) = \Lambda \nabla_{\tilde{\omega}} \log F_Z(U^T \tilde{\omega}) \tag{32}$$

Note that the characteristic functions of the source and noise after transformation can be written in terms of the known characteristic functions $F_X(\omega)$ and $F_Z(\omega)$, specifically $F_{UX}(\omega) = F_X(U^T \omega)$ and $F_{UZ}(\omega) = F_Z(U^T \omega)$. Plugging these expressions in (32), we have

$$\nabla_{\tilde{\omega}} \log F_{UX}(\tilde{\omega}) = \Lambda \nabla_{\tilde{\omega}} \log F_{UZ}(\tilde{\omega}) \tag{33}$$

Using the fact that $\Lambda$ is diagonal, we convert (33) to the set of $m$ scalar differential equations of (30).

Further insight into the above necessary and sufficient condition is provided via the following corollaries.

Corollary 6. Let $F_{[UX]_i}(\omega)$ and $F_{[UZ]_i}(\omega)$ be the marginal characteristic functions of the transform coefficients $[UX]_i$ and $[UZ]_i$ respectively. A necessary condition for linearity of optimal estimation is:

$$F_{[UX]_i}(\omega) = F_{[UZ]_i}^{\lambda_i}(\omega), \quad 1 \le i \le m \tag{34}$$

Proof: The marginal characteristic functions of $[UX]_i$ and $[UZ]_i$ are obtained by setting $\omega_k = 0$, $\forall k \neq i$ in $F_{UX}(\omega)$ and $F_{UZ}(\omega)$ respectively. By setting $\omega_k = 0$, $\forall k \neq i$ in both sides of (30), we have

$$\frac{\partial \log F_{[UX]_i}(\omega)}{\partial \omega} = \lambda_i \frac{\partial \log F_{[UZ]_i}(\omega)}{\partial \omega}, \quad 1 \le i \le m \tag{35}$$

The solution to this differential equation is given by:

$$\log F_{[UX]_i}(\omega) = \lambda_i \log F_{[UZ]_i}(\omega) + C \tag{36}$$

where $C$ is a constant. Imposing $F_{[UZ]_i}(0) = F_{[UX]_i}(0) = 1$, we obtain $C = 0$, which implies:

$$F_{[UX]_i}(\omega) = F_{[UZ]_i}^{\lambda_i}(\omega) \tag{37}$$

■



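Corollary 7. A necessary condition for linearity of optimal estimation is that one of the following holds for every pair $i, j$ ($i \neq j$):

• i) $\lambda_i = \lambda_j$

• ii) $[UX]_i$ is independent of $[UX]_j$ and $[UZ]_i$ is independent of $[UZ]_j$.

Proof: Let us rewrite (30) explicitly for the $i$th and $j$th coefficients.

$$\frac{\partial \log F_{UX}(\omega)}{\partial \omega_i} = \lambda_i \frac{\partial \log F_{UZ}(\omega)}{\partial \omega_i} \tag{38}$$

$$\frac{\partial \log F_{UX}(\omega)}{\partial \omega_j} = \lambda_j \frac{\partial \log F_{UZ}(\omega)}{\partial \omega_j} \tag{39}$$

We take the partial derivative of both sides of (38) with respect to $\omega_j$ and of both sides of (39) with respect to $\omega_i$, to obtain the following:

$$\frac{\partial^2 \log F_{UX}(\omega)}{\partial \omega_i \partial \omega_j} = \lambda_i \frac{\partial^2 \log F_{UZ}(\omega)}{\partial \omega_i \partial \omega_j} \tag{40}$$

$$\frac{\partial^2 \log F_{UX}(\omega)}{\partial \omega_i \partial \omega_j} = \lambda_j \frac{\partial^2 \log F_{UZ}(\omega)}{\partial \omega_i \partial \omega_j} \tag{41}$$

There are only two ways to simultaneously satisfy (40) and (41): i) $\lambda_i = \lambda_j$; ii) the second order derivatives vanish, i.e.,

$$\frac{\partial^2 \log F_{UX}(\omega)}{\partial \omega_i \partial \omega_j} = 0 \tag{42}$$

$$\frac{\partial^2 \log F_{UZ}(\omega)}{\partial \omega_i \partial \omega_j} = 0 \tag{43}$$

Let us focus on $X$, i.e., (42); the derivation for $Z$ follows similarly. $F_{[UX]_{ij}}(\omega_i, \omega_j)$, i.e., the marginal characteristic function of the pair ($[UX]_i$, $[UX]_j$), is obtained by setting $\omega_k = 0$, $\forall k \neq i, j$. Then, (42) implies

$$\frac{\partial^2 \log F_{[UX]_{ij}}(\omega_i, \omega_j)}{\partial \omega_i \partial \omega_j} = 0 \tag{44}$$

which means

$$\log F_{[UX]_{ij}}(\omega_i, \omega_j) = A(\omega_i) + B(\omega_j) \tag{45}$$

for some functions $A$ and $B$, i.e., $\log F_{[UX]_{ij}}(\omega_i, \omega_j)$ is additively separable in terms of $\omega_i$ and $\omega_j$. This implies

$$F_{[UX]_{ij}}(\omega_i, \omega_j) = C(\omega_i) D(\omega_j) \tag{46}$$

for some functions $C$ and $D$. But (46) implies independence of the $i$th and $j$th transform coefficients of the source $X$. The independence of the $i$th and $j$th transform coefficients of the noise $Z$ follows from similar arguments. ■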

Corollary 8. If the necessary condition of Corollary 6 is satisfied, then a sufficient condition for linearity of optimal estimation is that $U$ generates independent coefficients for both $X$ and $Z$.

Proof: Independence of the transform coefficients implies that the joint characteristic function is the product of the marginals:

$$F_{UX}(\omega) = \prod_{i=1}^{m} F_{[UX]_i}(\omega_i), \quad F_{UZ}(\omega) = \prod_{i=1}^{m} F_{[UZ]_i}(\omega_i) \tag{47}$$

Plugging (47) into the necessary and sufficient condition (30) of Theorem 6, it is straightforward to show that (34), the necessary condition of Corollary 6, is now both necessary and sufficient. ■

While the condition in Corollary 8 involves independence of transform coefficients, the weaker property of uncorrelatedness is already guaranteed by the transform $U$. The matrix $U$ diagonalizes both $R_X$ and $R_Z$. We formalize this in the following lemma:

Lemma 3. Transform $U$ decorrelates both source and noise: both $U R_X U^T$ and $U R_Z U^T$ are diagonal matrices.

Proof: Since both $R_X$ and $R_Z$ are, by definition, positive definite matrices, there exists a matrix $S$ that simultaneously diagonalizes $R_X$ and whitens $R_Z$, i.e., $S R_X S^T = \Lambda_X$ and $S R_Z S^T = I$ where $\Lambda_X$ is diagonal and $I$ is the identity matrix [21]. Hence, $R_X$ and $R_Z$ can be expressed as the following:

$$R_X = S^{-1} \Lambda_X S^{-T}, \quad R_Z = S^{-1} S^{-T} \tag{48}$$

Plugging (48) into (27), we obtain $U = \Lambda_U S$, where $\Lambda_U$ is diagonal. Substituting $U$ in $U R_X U^T$ and $U R_Z U^T$, we obtain:

$$U R_X U^T = \Lambda_U \Lambda_X \Lambda_U^T, \quad U R_Z U^T = \Lambda_U \Lambda_U^T \tag{49}$$

The product of diagonal matrices is also diagonal. ■
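Lemma 3 can be illustrated numerically. The sketch below builds one valid choice of $U$ (the matrix $S$ from the proof, i.e., $\Lambda_U = I$) through a Cholesky factorization of $R_Z$ followed by a symmetric eigendecomposition. The helper name `decorrelating_transform` and the randomly generated covariance matrices are assumptions made for illustration only.

```python
import numpy as np


def decorrelating_transform(Rx, Rz):
    """Return (U, lam) such that Rx Rz^{-1} = U^{-1} diag(lam) U.

    U is obtained by whitening Rz and then diagonalizing the whitened Rx,
    so it plays the role of S in the proof of Lemma 3.
    """
    L = np.linalg.cholesky(Rz)        # Rz = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ Rx @ L_inv.T          # symmetric, similar to Rx Rz^{-1}
    lam, W = np.linalg.eigh(C)        # C = W diag(lam) W^T
    U = W.T @ L_inv
    return U, lam


# Hypothetical positive definite covariances for a 3-dimensional example.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 3, 3))
Rx = A @ A.T + 3.0 * np.eye(3)
Rz = B @ B.T + 3.0 * np.eye(3)

U, lam = decorrelating_transform(Rx, Rz)
print(np.round(U @ Rx @ U.T, 6))      # diagonal, with lam on the diagonal
print(np.round(U @ Rz @ U.T, 6))      # identity
```

Any row scaling of this $U$ also satisfies (27), which corresponds to the diagonal factor $\Lambda_U$ in the proof.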
As an example where the optimal estimator is known to be linear, consider the multivariate Gaussian case. Note that the Gaussian source-channel pair satisfies the scalar matching condition for any SNR, i.e., (37). As any linear transform preserves joint Gaussianity in the transform domain, $U$ generates jointly Gaussian and uncorrelated coefficients which are therefore independent, satisfying the conditions of Corollary 8.

Another, perhaps surprising, example where the optimal estimator is linear involves identically distributed source $X$ and noise $Z$. In this case, the linear estimator is optimal irrespective of the distribution of source and noise. It is straightforward to show that the necessary and sufficient conditions of Theorem 6 are satisfied if $F_X(\omega) = F_Z(\omega)$.

Example 2: Let us consider a numerical example that highlights the differences between the conditions derived for vectors and those for scalars. Consider a setting where a two dimensional random variable $Z'$ has independent components, both of which are uniformly distributed over $[-a, a]$, i.e., $Z' = [Z'_1, Z'_2]$ and $Z'_1 \sim Z'_2 \sim U[-a, a]$. Also, let $X'$ have two independent identically distributed components $X' = [X'_1, X'_2]$ where $X'_1$ and $X'_2$ are distributed according to a density given by the convolution of the uniform density with itself, i.e., $X'_1 \sim X'_2 \sim (U[-a, a] * U[-a, a])$. Since $X'$ and $Z'$ satisfy the sufficient conditions in Corollary 8, the optimal estimator is linear for the source-channel pair $(X', Z')$.

Let us next consider the source-channel pair $(X, Z)$ to be $X = Q_X X'$ and $Z = Q_Z Z'$ where $Q_X$ and $Q_Z$ are $2 \times 2$ orthogonal matrices ($Q_X Q_X^T = Q_Z Q_Z^T = I$). This introduces dependencies among the components of $X$ and $Z$. We already saw that for $Q_X = Q_Z = I$, the optimal estimator is linear. Also, from standard linear estimation principles [22], it follows that the minimum estimation error achievable by linear estimators does not depend on $Q_X$ and $Q_Z$, i.e., the linear estimation error is constant with respect to $Q_X$ and $Q_Z$. The question we are interested in is: can the linear estimator be optimal for any other pair $(Q_X, Q_Z)$? Corollary 8 sheds light on this question. First, we consider the case where $Q_X = \pm Q_Z$. Observe that any orthogonal matrix $U$ satisfies condition (27). Hence, we can set $U = Q_X^T = \pm Q_Z^T$, leading to $UX = X'$ and $UZ = \pm Z'$, where $-Z'$ has the same distribution as $Z'$ by symmetry. This implies that $UX$ and $UZ$ satisfy the conditions in Corollary 8, which are sufficient to prove linearity of optimal estimators. Hence, for the source-channel pair $(X, Z)$, optimal estimators are always linear if $Q_X = \pm Q_Z$.

Finally, we consider the case where $Q_X \neq \pm Q_Z$. In general, any orthogonal matrix can be written in terms of another orthogonal matrix as

$$Q_X = G(\theta) Q_Z \tag{50}$$

where

$$G(\theta) = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}$$

(also known as Givens rotation [21]). For a constant $Q_Z$, we change $Q_X$ by varying $\theta$ and observe the behavior of the difference between the mean square errors obtained by the optimal and the linear estimators. As a performance metric, we consider the normalized difference of estimation errors, i.e., (MSE of linear estimation - MSE of optimal estimation) / MSE of optimal estimation. The variation of the normalized difference as a function of $\theta$ is plotted in Figure 4. Observe that at $\theta = 0$ and $\pi$ the optimal estimator is linear, as expected from Corollary 8. It is not hard to show using the symmetry of $X'$ and $Z'$ that the conditions of Corollary 8 are also satisfied for $\theta = \pi/2$ (and $3\pi/2$). A perhaps interesting observation is that the deviation of the optimal estimator from linearity grows monotonically in $\theta$ in the range $\theta \in (0, \pi/4)$.

Fig. 4. Normalized difference between optimal and linear estimation versus the Givens rotation parameter $\theta$, for the source-channel pair $(X, Z)$.
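The claim in Example 2 that the linear estimation error does not depend on $Q_X$ and $Q_Z$ can be checked with a few lines of code. The sketch below uses $\mathrm{Var}(Z'_i) = a^2/3$ for the uniform components and $\mathrm{Var}(X'_i) = 2a^2/3$ for the sum of two independent uniforms. The value `a = 1.0`, the fixed rotation angle used for `Qz`, and the helper names are arbitrary illustrative choices. Evaluating the optimal (nonlinear) estimator itself would require two dimensional numerical integration and is not attempted here.

```python
import numpy as np


def givens(theta):
    """2 x 2 Givens rotation G(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def linear_mse(Rx, Rz):
    """Total MSE of the linear estimator K Y with K = Rx (Rx + Rz)^{-1}."""
    K = Rx @ np.linalg.inv(Rx + Rz)
    return np.trace(Rx - K @ Rx)


a = 1.0                                        # hypothetical half-width of the uniform density
Rz_prime = (a**2 / 3.0) * np.eye(2)            # covariance of Z'
Rx_prime = (2.0 * a**2 / 3.0) * np.eye(2)      # covariance of X'
Qz = givens(0.3)                               # arbitrary fixed rotation for the noise

mses = []
for theta in np.linspace(0.0, 2.0 * np.pi, 9):
    Qx = givens(theta) @ Qz                    # relation (50)
    Rx = Qx @ Rx_prime @ Qx.T
    Rz = Qz @ Rz_prime @ Qz.T
    mses.append(linear_mse(Rx, Rz))

print(np.round(mses, 6))
```

Because $R_{X'}$ and $R_{Z'}$ are scaled identities, the rotations leave both covariances unchanged, so every entry equals $4a^2/9$.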

An important observation is that the necessary and sufficient condition for scalars (14) is also a necessary condition for vectors (34), in the transform domain. Due to this fact, it is straightforward to extend the existence and uniqueness results and implications of the scalar matching conditions to the vector spaces. These trivial extensions are omitted here for conciseness.

VI. Conclusion 

In this paper, we derived conditions under which the Lp 
optimal estimator is linear. We identified the conditions for 
the existence and uniqueness of a source distribution that 
matches the noise in a way that ensures linearity of the optimal 
estimator, for the special case of p = 2. One trivial example of 
this type of matching occurs for Gaussian source and Gaussian 
noise at all SNR levels. Another instance of matching happens 
when the source and noise are identically distributed. We also 
showed that the Gaussian source-channel pair is unique in
that it is the only pair for which the optimal estimator is
linear at more than one SNR value. Moreover, we showed the
asymptotic linearity of MSE optimal estimators at low SNR
if the channel is Gaussian, regardless of the source, and vice 
versa, at high SNR if the source is Gaussian regardless of the 
channel. We also studied the extension to vector spaces where 
additional conditions are derived beyond those inherited from 
the scalar case, which concern interactions across components. 

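Appendix A
Proof of Lemma 1

Proof: First, we show the sufficiency of the necessary conditions for the $L_p$ norm. Note that $\Phi(x) = |x|^p$ is convex for $p \ge 2$, i.e., $\Phi''(x) > 0$, $\forall x \in \mathbb{R} - \{0\}$. We need to show

$$\left.\frac{\partial^2}{\partial \epsilon^2} J[h(y) + \epsilon \eta(y)]\right|_{\epsilon=0} \ge 0, \quad \text{for any variation function } \eta(y).$$

We have

$$\left.\frac{\partial^2}{\partial \epsilon^2} J[h(y) + \epsilon \eta(y)]\right|_{\epsilon=0} = \iint \eta^2(y)\, \Phi''(x - h(y))\, f_X(x) f_Z(y - x)\, dx\, dy \tag{51}$$

All factors in the integral are non-negative and hence

$$\left.\frac{\partial^2}{\partial \epsilon^2} J[h(y) + \epsilon \eta(y)]\right|_{\epsilon=0} \ge 0, \quad \text{for any } \eta(y).$$

Next, we show the uniqueness (in probabilistic sense) of the optimal estimator for even natural $p$. Assume $h_1(Y)$ and $h_2(Y)$ both satisfy (6) while $P[h_1(Y) \neq h_2(Y)] > 0$, i.e., over a set of positive measure $h_1(Y) \neq h_2(Y)$. Then, the following holds for any $\eta(Y)$:

$$E\left\{\left\{[X - h_2(Y)]^{p-1} - [X - h_1(Y)]^{p-1}\right\} \eta(Y)\right\} = 0 \tag{52}$$

Note that

$$[X - h_2(Y)]^{p-1} - [X - h_1(Y)]^{p-1} = (h_1(Y) - h_2(Y))\, \beta(X, Y) \tag{53}$$

where

$$\beta(X, Y) = \sum_{m=0}^{p-2} [X - h_1(Y)]^{p-2-m} [X - h_2(Y)]^{m} \tag{54}$$

Proposition. $h_1(Y) \neq h_2(Y)$ implies $\beta(X, Y) > 0$, $\forall X, Y \in \mathbb{R}$.

To see this, we note that (53) is a simple factorization of the form

$$A^{p-1} - B^{p-1} = (A - B)\, \beta(A, B) \tag{55}$$

where $\beta(A, B)$ is a polynomial. Now if $A \neq B$, then, since $p - 1$ is odd, the sign of the left hand side equals the sign of $A - B$. Hence $\beta(A, B) > 0$.

Next, plugging $\eta(Y) = h_1(Y) - h_2(Y)$ in (52), we obtain

$$E\{[h_1(Y) - h_2(Y)]^2 \beta(X, Y)\} = 0 \tag{56}$$

Since $h_1(Y) \neq h_2(Y)$ implies $\beta(X, Y) > 0$, $\forall X, Y \in \mathbb{R}$, (56) requires $h_1(Y) = h_2(Y)$ almost everywhere, contradicting the hypothesis $P[h_1(Y) \neq h_2(Y)] > 0$.

Appendix B
Proof of Theorem 1

The necessary and sufficient condition (6) can be rewritten as:

$$\int \left\{ \int (x - ky)^{p-1} f_X(x) f_Z(y - x)\, dx \right\} \eta(y)\, dy = 0 \tag{57}$$

for all admissible perturbation functions $\eta(y)$. This equality is achieved for all $\eta(y)$ if and only if the expression in braces vanishes almost everywhere. Hence, (6) is satisfied if and only if:

$$\int (x - ky)^{p-1} f_X(x) f_Z(y - x)\, dx = 0, \quad \text{a.e.} \tag{58}$$

Applying the binomial expansion to the first factor

$$(x - ky)^{p-1} = \sum_{m=0}^{p-1} \binom{p-1}{m} (-ky)^m x^{p-m-1} \tag{59}$$

and rearranging terms, we get

$$\sum_{m=0}^{p-1} \binom{p-1}{m} (-ky)^m \int x^{p-m-1} f_X(x) f_Z(y - x)\, dx = 0 \tag{60}$$

Let $*$ denote the convolution operator, and rewrite (60) as

$$\sum_{m=0}^{p-1} \binom{p-1}{m} (-ky)^m \left[ \left( y^{p-m-1} f_X(y) \right) * f_Z(y) \right] = 0 \tag{61}$$

Taking the Fourier transform$^1$,

$$\sum_{m=0}^{p-1} \binom{p-1}{m} (-k)^m \frac{d^m}{d\omega^m} \left[ \frac{d^{p-m-1} F_X(\omega)}{d\omega^{p-m-1}} F_Z(\omega) \right] = 0 \tag{62}$$

differentiating in parts,

$$\sum_{m=0}^{p-1} \binom{p-1}{m} (-k)^m \sum_{l=0}^{m} \binom{m}{l} \frac{d^{p-1-l} F_X(\omega)}{d\omega^{p-1-l}} \frac{d^{l} F_Z(\omega)}{d\omega^{l}} = 0 \tag{63}$$

interchanging summations,

$$\sum_{l=0}^{p-1} \frac{d^{p-1-l} F_X(\omega)}{d\omega^{p-1-l}} \frac{d^{l} F_Z(\omega)}{d\omega^{l}} \sum_{m=l}^{p-1} \binom{p-1}{m} \binom{m}{l} (-k)^m = 0 \tag{64}$$

applying some combinatoric algebra,

$$\sum_{l=0}^{p-1} \binom{p-1}{l} \frac{d^{p-1-l} F_X(\omega)}{d\omega^{p-1-l}} \frac{d^{l} F_Z(\omega)}{d\omega^{l}} \sum_{m=l}^{p-1} \frac{(p-1-l)!}{(m-l)!\,(p-1-m)!} (-k)^m = 0 \tag{65}$$

and substituting $t = m - l$, we get

$$\sum_{l=0}^{p-1} \binom{p-1}{l} \frac{d^{p-1-l} F_X(\omega)}{d\omega^{p-1-l}} \frac{d^{l} F_Z(\omega)}{d\omega^{l}} \sum_{t=0}^{p-1-l} \binom{p-1-l}{t} (-k)^{t+l} = 0 \tag{66}$$

Finally, noting that

$$(1 - k)^{p-1-l} = \sum_{t=0}^{p-1-l} \binom{p-1-l}{t} (-k)^t \tag{67}$$

we obtain that (10) is a necessary and sufficient condition.

We note that all steps of the derivation were obtained as "if and only if" statements, hence the converse is automatically proved.

$^1$Note that the Fourier transforms exist due to the finite moments assumption stated in Section II.A.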
Appendix C
Formal Proof of Theorem 5

Let $h(y) = k[y + \xi(y)]$ be the polynomial expansion of the optimal estimator, where $\xi(y)$ consists of terms of order two or higher only. Let us rewrite the optimal estimator,

$$h(y) = k[y + \xi(y)] = \frac{\int x f_X(x) f_Z(y - x)\, dx}{\int f_X(x) f_Z(y - x)\, dx} \tag{68}$$

or

$$k[y + \xi(y)] \int f_X(x) f_Z(y - x)\, dx = \int x f_X(x) f_Z(y - x)\, dx \tag{69}$$

Expressing the integrals as convolutions, we have

$$k[y + \xi(y)]\, [f_X(y) * f_Z(y)] = [y f_X(y)] * f_Z(y) \tag{70}$$

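Taking the Fourier transform of both sides, we obtain

$$jk \frac{d[F_X(\omega) F_Z(\omega)]}{d\omega} - k [F_X(\omega) F_Z(\omega)] * \Xi(\omega) = j F_Z(\omega) \frac{dF_X(\omega)}{d\omega} \tag{71}$$

where $\Xi(\omega)$ denotes the Fourier transform of $\xi(\cdot)$. Plugging $k = \frac{\gamma}{1+\gamma}$ and dividing both sides by $F_X(\omega) F_Z(\omega)$, we have

$$\frac{1}{F_X(\omega)} \frac{dF_X(\omega)}{d\omega} = -C(\omega) + \gamma \frac{1}{F_Z(\omega)} \frac{dF_Z(\omega)}{d\omega} \tag{72}$$

or more compactly,

$$\frac{d}{d\omega} \log F_X(\omega) = -C(\omega) + \frac{d}{d\omega} \log F_Z^{\gamma}(\omega) \tag{73}$$

where $C(\omega) = \gamma \dfrac{[F_X(\omega) F_Z(\omega)] * \Xi(\omega)}{j [F_X(\omega) F_Z(\omega)]}$.

Now consider the setting where the source is Gaussian and $\gamma \to \infty$. By applying the central limit theorem, we have $F_Z^{\gamma}(\omega) \to F_X(\omega)$ pointwise as $\gamma \to \infty$. Hence, $C(\omega) \to 0$ pointwise for all $\omega \in \mathbb{R}$. But this implies $\Xi(\omega) \to 0$ (pointwise) and hence, in the limit $\gamma \to \infty$, $\xi(y) = 0$ almost everywhere with respect to the density of $y$. Also, it follows from the same arguments that when the noise is Gaussian and $\gamma \to 0$, $\xi(y) = 0$ a.e.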
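The central limit step can be visualized for a concrete noise distribution. The sketch below assumes uniform noise on $[-a, a]$ with variance $1/\gamma$ (so that the matched source has unit variance), whose characteristic function is $\sin(a\omega)/(a\omega)$, and compares $F_Z^{\gamma}(\omega)$ with the unit-variance Gaussian characteristic function $e^{-\omega^2/2}$. The choice of uniform noise, the frequency range, and the integer values of $\gamma$ are illustrative assumptions.

```python
import numpy as np


def matched_source_cf(omega, gamma):
    """F_Z(omega)**gamma for Z ~ U[-a, a] with Var(Z) = 1 / gamma."""
    a = np.sqrt(3.0 / gamma)
    # np.sinc(t) = sin(pi t) / (pi t), so this evaluates sin(a w) / (a w).
    return np.sinc(a * omega / np.pi) ** gamma


omega = np.linspace(-3.0, 3.0, 601)
gauss_cf = np.exp(-0.5 * omega**2)            # characteristic function of N(0, 1)

errors = [
    np.max(np.abs(matched_source_cf(omega, g) - gauss_cf)) for g in (4, 16, 64)
]
for g, err in zip((4, 16, 64), errors):
    print(f"gamma = {g}: max deviation from Gaussian cf = {err:.5f}")
```

The maximum deviation shrinks roughly in proportion to $1/\gamma$, consistent with the pointwise convergence used in the proof.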

Appendix D 
Derivation-Vector Case 

Let us rewrite the MSE optimal estimator for the vector case:

$$h(y) = \frac{\int x f_X(x) f_Z(y - x)\, dx}{\int f_X(x) f_Z(y - x)\, dx} \tag{74}$$

Plugging $h(y) = Ky$ in (74), we obtain

$$Ky \int f_X(x) f_Z(y - x)\, dx = \int x f_X(x) f_Z(y - x)\, dx \tag{75}$$

Expressing the integrals as $m$-fold convolutions, we get

$$Ky\, [f_X(y) * f_Z(y)] = [y f_X(y)] * f_Z(y) \tag{76}$$

Taking the Fourier transform of both sides,

$$jK \nabla [F_X(\omega) F_Z(\omega)] = j F_Z(\omega) \nabla F_X(\omega) \tag{77}$$

and rearranging terms, we get

$$(I - K) \frac{1}{F_X(\omega)} \nabla F_X(\omega) = K \frac{1}{F_Z(\omega)} \nabla F_Z(\omega) \tag{78}$$

Using $\nabla \log F_X(\omega) = \frac{1}{F_X(\omega)} \nabla F_X(\omega)$,

$$\nabla \log F_X(\omega) = (I - K)^{-1} K \nabla \log F_Z(\omega) \tag{79}$$

Note that (see e.g. [22])

$$K = R_X (R_X + R_Z)^{-1} \tag{80}$$

hence we have

$$(I - K) = (R_X + R_Z)(R_X + R_Z)^{-1} - R_X (R_X + R_Z)^{-1} = R_Z (R_X + R_Z)^{-1} \tag{81}$$

and

$$\begin{aligned}
(I - K)^{-1} K &= [R_Z (R_X + R_Z)^{-1}]^{-1} R_X (R_X + R_Z)^{-1} \\
&= [R_Z (R_X + R_Z)^{-1}]^{-1} [R_X + R_Z - R_Z] (R_X + R_Z)^{-1} \\
&= [R_Z (R_X + R_Z)^{-1}]^{-1} - I \\
&= [(R_X + R_Z) R_Z^{-1}] - I \\
&= R_X R_Z^{-1} + I - I \\
&= R_X R_Z^{-1}
\end{aligned} \tag{82}$$

Plugging (82) into (79), we obtain

$$\nabla \log F_X(\omega) = R_X R_Z^{-1} \nabla \log F_Z(\omega) \tag{83}$$

Using the eigendecomposition $R_X R_Z^{-1} = U^{-1} \Lambda U$, where $\Lambda$ is diagonal with eigenvalues $\lambda_1, \ldots, \lambda_m$, we obtain

$$U \nabla \log F_X(\omega) = \Lambda U \nabla \log F_Z(\omega) \tag{84}$$

Similar to the scalar case, we can show the converse by retracing the steps in the derivation of the necessity. Note that none of these steps, (74)-(84), introduce any loss of generality, hence retracing back from (84) to (74), we show that if (84) is satisfied, the optimal estimator is linear.
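The matrix identities (81) and (82) are easy to confirm numerically. The following sketch uses two randomly generated positive definite matrices as stand-ins for $R_X$ and $R_Z$. The dimension, the random seed, and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

# Hypothetical positive definite covariances for a 4-dimensional example.
rng = np.random.default_rng(1)
A, B = rng.standard_normal((2, 4, 4))
Rx = A @ A.T + 4.0 * np.eye(4)
Rz = B @ B.T + 4.0 * np.eye(4)

identity = np.eye(4)
K = Rx @ np.linalg.inv(Rx + Rz)                 # linear estimator gain, eq. (80)

lhs_81 = identity - K
rhs_81 = Rz @ np.linalg.inv(Rx + Rz)            # eq. (81)

lhs_82 = np.linalg.inv(identity - K) @ K
rhs_82 = Rx @ np.linalg.inv(Rz)                 # eq. (82)

print(np.max(np.abs(lhs_81 - rhs_81)))
print(np.max(np.abs(lhs_82 - rhs_82)))
```

Both printed deviations are at the level of floating point round-off.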

Appendix E 
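Proof of Lemma 2

By the chain rule we have

$$\frac{\partial f(Ax)}{\partial x_i} = \sum_{k=1}^{m} \frac{\partial f(Ax)}{\partial [Ax]_k} \frac{\partial [Ax]_k}{\partial [x]_i} \tag{85}$$

$$= \sum_{k=1}^{m} \frac{\partial f(Ax)}{\partial [Ax]_k} \frac{\partial ([A]_k^T x)}{\partial [x]_i} \tag{86}$$

$$= \sum_{k=1}^{m} \frac{\partial f(Ax)}{\partial [Ax]_k} [A]_{ki} \tag{87}$$

$$= \sum_{k=1}^{m} \partial_k f(Ax)\, [A]_{ki} \tag{88}$$

$$= [A^T \nabla f(Ax)]_i \tag{89}$$

where $[A]_k^T$ denotes the $k$th row of $A$. It follows from (89) that $\nabla_x f(Ax) = A^T \nabla f(Ax)$.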
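Lemma 2 can be sanity-checked with finite differences. The sketch below picks an arbitrary smooth test function `f` with a known gradient `grad_f` (both hypothetical, chosen only for this check) and compares a central-difference gradient of $x \mapsto f(Ax)$ with $A^T \nabla f(Ax)$.

```python
import numpy as np


def f(v):
    """Arbitrary smooth test function on R^3 (illustrative choice)."""
    return np.sin(v[0]) * v[1] ** 2 + np.exp(0.5 * v[2])


def grad_f(v):
    """Analytic gradient of f."""
    return np.array([
        np.cos(v[0]) * v[1] ** 2,
        2.0 * np.sin(v[0]) * v[1],
        0.5 * np.exp(0.5 * v[2]),
    ])


def numerical_grad(g, x, eps=1e-6):
    """Central-difference approximation of the gradient of g at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (g(x + step) - g(x - step)) / (2.0 * eps)
    return grad


rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

lhs = numerical_grad(lambda t: f(A @ t), x)     # gradient of f(Ax) with respect to x
rhs = A.T @ grad_f(A @ x)                       # right-hand side of (29)
print(np.max(np.abs(lhs - rhs)))
```

The two sides agree up to the finite-difference error, as the chain rule argument predicts.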